The motivations for this post are simple:

  1. There is a wealth of data on the COVID-19 pandemic.
  2. That data is misrepresented on a daily basis.

The most egregious sin is presenting counts of new cases without accounting for increased testing. This not only distorts comparisons between states (and countries) but also misrepresents trends over time within them. I present the numbers both ways in the analysis below and also offer an interactive application for viewing the data.

In addition, I compare California and New York to offer some data-driven insight into why the two states had such drastically different experiences.

National Data

New Cases

The chart below shows the number of daily new cases. The orange line is the 15-day trend. This chart suggests that we saw a modest reduction in new cases and are now poised to see an explosion. It’s alarming and has prompted renewed wall-to-wall news coverage of a “second wave”. It looks (incorrectly) like a tsunami.

The above graph has been presented in various forms by practically every news outlet. That’s unfortunate, because it’s misleading. The chart below presents the same data, but also includes the number of negative tests in orange. The second chart translates both numbers into the rate of new cases per 100 tests performed.

To be sure, the rate of new cases has begun rising in the past five days; however, these charts present a much less scary picture of reality, and it is also the correct one. If five percent of the population is infected and the people tested are representative of it, doubling the number of tests doubles the number of positive cases detected, but the rate of infection is unchanged.
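The arithmetic behind that example is easy to sketch. The snippet below is a minimal illustration, not the code behind the charts: `positives_per_100_tests` is a hypothetical helper name, and the test counts assume that exactly 5% of the people tested are infected.

```python
def positives_per_100_tests(positive: int, negative: int) -> float:
    """Return new cases per 100 tests performed (positive + negative)."""
    total_tests = positive + negative
    if total_tests <= 0:
        raise ValueError("no tests were performed")
    return 100 * positive / total_tests


# Illustrative only: assume 5% of the people tested are infected.
INFECTION_RATE = 0.05

for tests in (10_000, 20_000):
    positive = round(tests * INFECTION_RATE)
    negative = tests - positive
    rate = positives_per_100_tests(positive, negative)
    print(f"{tests:,} tests: {positive:,} positives, {rate:.1f} per 100 tests")
```

Doubling the tests from 10,000 to 20,000 doubles the positives from 500 to 1,000, while the rate stays at 5.0 per 100 tests.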

It is also worth noting that the high rates of positive cases in April were likely the result of limited tests being reserved for people with COVID-like symptoms. As testing has increased, a more accurate measure of the infection rate has emerged.

Hospitalizations

Daily hospital usage is perhaps the best way to track the impact of COVID-19. In the chart below, each bar represents the number of people in the hospital due to the virus on that day. This metric fell from late April to early June, but it too has risen over the past four days.

Deaths

Like hospital usage, COVID-19 deaths are easier to interpret than counts of new cases, because they don’t depend on how many tests happen to be performed.

In the chart below, the 15-day trend smooths out the cyclical pattern and makes the downward trend clear. Deaths have fallen since late April; however, deaths are a lagging indicator. In a second-wave scenario, you would expect hospitalizations to rise first, followed by deaths. The last bar on the chart (June 25th) is an odd data point and almost certainly reflects a data reporting issue. The coming days will clarify.
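The post does not spell out how the 15-day trend is computed. A trailing moving average is one common way to build such a trend line, and the sketch below assumes that. The helper name `trailing_average` and the weekly-cycle death counts are hypothetical, chosen only to show how the smoothing damps a repeating pattern.

```python
from __future__ import annotations

from collections import deque


def trailing_average(values: list[float], window: int = 15) -> list[float | None]:
    """Mean of the most recent `window` values at each day; None until the window fills."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent: deque[float] = deque(maxlen=window)  # the oldest day drops out automatically
    smoothed: list[float | None] = []
    for value in values:
        recent.append(value)
        smoothed.append(sum(recent) / window if len(recent) == window else None)
    return smoothed


# Hypothetical daily deaths with a weekly reporting cycle (low weekend counts).
one_week = [900, 950, 1000, 980, 940, 500, 450]
daily_deaths = one_week * 4
trend = trailing_average(daily_deaths, window=15)

filled = [t for t in trend if t is not None]
print(f"raw range: {max(daily_deaths) - min(daily_deaths)}")
print(f"smoothed range: {max(filled) - min(filled):.1f}")
```

Because 15 is not a multiple of 7, each window counts one weekday three times, so a small ripple survives in this toy series; a 14-day window would cancel a purely weekly cycle exactly.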

National Summary

Perhaps the most important takeaway from the above review is that daily hospitalization rates are the best statistic to broadcast. They don’t require adjustment for the amount of testing, and they lag behind less than daily deaths do.

Most media stories I’m seeing are misrepresenting the data by focusing on the absolute number of new cases. Without adjusting for the amount of testing, this is the worst metric to use.

State Data

While the nation as a whole is trending in the right direction, decisions to re-open are made state by state. Fortunately, we can look at the same metrics for each state. Additionally, the actions taken by different states may create natural experiments for us to examine.

In addition to the analysis I present in this post, I’ve also created an interactive viewer for the state data. Click the image below to open the app and choose which metric to view, see trends for each state, and compare states to each other.

New Case Rate

The table below shows the rate of new cases per day for a handful of states. Issues with the daily state data create some noise, but the different patterns are easy to see.

Death Rate

At the state level, comparing absolute deaths would be misleading. States like New York and California have more people, which means more deaths. As I did with new cases, I convert deaths into a rate: deaths per one million people.
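The per-million conversion is a single division. A minimal sketch, with hypothetical state names and round populations chosen only to show why the adjustment matters:

```python
def per_million(count: float, population: int) -> float:
    """Convert a raw count into a rate per one million residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 1_000_000 / population


# Hypothetical states: identical death counts, very different populations.
deaths = {"State A": 400, "State B": 400}
population = {"State A": 40_000_000, "State B": 4_000_000}

rates = {state: per_million(deaths[state], population[state]) for state in deaths}
print(rates)  # {'State A': 10.0, 'State B': 100.0}
```

The same 400 deaths represent a tenfold difference in severity once population is taken into account.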

Even with this adjustment, the chart below shows how severe the problem became in NY compared to other states.

Note: Some states, like NJ and PA, have single days with worse rates, but their daily data is noisy, indicating reporting issues.

Regional Differences

The chart below shows weekly rates of hospitalization and death, colored by region of the country. Any state that experienced a weekly death rate above 50 per million is labeled.
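The weekly view and the labeling rule can be sketched as follows. This assumes weeks are consecutive 7-day blocks starting on the first day of data (the post does not say how its weeks are aligned); the helper names `weekly_rates` and `states_to_label` and the daily counts are hypothetical.

```python
def weekly_rates(daily_counts: list[int], population: int) -> list[float]:
    """Sum daily counts into consecutive 7-day weeks, each expressed per million people.

    A trailing partial week is dropped so every value covers a full seven days.
    """
    rates = []
    for start in range(0, len(daily_counts) - 6, 7):
        week_total = sum(daily_counts[start:start + 7])
        rates.append(week_total * 1_000_000 / population)
    return rates


def states_to_label(weekly_death_rates: dict[str, list[float]], threshold: float = 50.0) -> list[str]:
    """States whose weekly death rate exceeded `threshold` per million in any week."""
    return sorted(
        state
        for state, rates in weekly_death_rates.items()
        if any(rate > threshold for rate in rates)
    )


# Hypothetical daily deaths for two states of two million people each.
rates_by_state = {
    "State X": weekly_rates([10] * 7 + [20] * 7, population=2_000_000),  # [35.0, 70.0]
    "State Y": weekly_rates([5] * 14, population=2_000_000),  # [17.5, 17.5]
}
print(states_to_label(rates_by_state))  # ['State X']
```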

Two primary patterns emerge:

  1. The northeastern states around New York City stand apart from the rest.
  2. Hospitalization and death rates are falling for nearly all states.

Note: States not reporting hospitalization data appear only along the x-axis.

New York City

New York City is the densest city in the US, has the most air travel, and has transit ridership that dwarfs that of other cities. These factors make it a hotbed for viral transmission.

The map below shows the peak death rate experienced in each state. The impact of New York City can be seen in the surrounding states. Outside the northeast, peak death rates were much lower. Louisiana, whose outbreak the CDC attributed to Mardi Gras, stands out in the south.

Comparing to California

California is the most populous state, but its experience with the virus so far has been much less severe. Comparing California and New York reveals important similarities and differences that could explain the different outcomes. It also helps that the two states issued stay-at-home orders within a day of each other.

Density

New York City is the largest, densest city in the country. The table below shows the top 10 US cities ranked by population density (source). San Francisco is second on the list but has only about one-tenth of New York City’s population. In fact, all the Californian cities in the table combined hold only about 1.4 million people. This disparity in population and density is a significant reason why NYC had a more severe outbreak.

City           State          Density (people/sq mi)  Population
New York       New York       28,317                  8,336,817
San Francisco  California     18,569                  881,549
Jersey City    New Jersey     17,848                  262,075
Paterson       New Jersey     17,500                  145,233
Cambridge      Massachusetts  17,289                  118,927
Daly City      California     14,009                  106,280
Boston         Massachusetts  13,938                  692,600
Miami          Florida        12,599                  467,963
Santa Ana      California     12,333                  332,318
Inglewood      California     12,160                  108,151
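The population comparisons quoted above can be checked against the table with a few lines of Python. The rows are copied from the table; the variable names are only illustrative.

```python
# (city, state, population) rows copied from the density table.
cities = [
    ("New York", "New York", 8_336_817),
    ("San Francisco", "California", 881_549),
    ("Jersey City", "New Jersey", 262_075),
    ("Paterson", "New Jersey", 145_233),
    ("Cambridge", "Massachusetts", 118_927),
    ("Daly City", "California", 106_280),
    ("Boston", "Massachusetts", 692_600),
    ("Miami", "Florida", 467_963),
    ("Santa Ana", "California", 332_318),
    ("Inglewood", "California", 108_151),
]
population = {city: pop for city, _state, pop in cities}

california_total = sum(pop for _city, state, pop in cities if state == "California")
sf_vs_nyc = population["San Francisco"] / population["New York"]

print(f"California cities combined: {california_total:,}")  # 1,428,298
print(f"San Francisco / New York: {sf_vs_nyc:.2f}")  # 0.11
```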

Air Travel

Air travel is another important consideration for viral spread. The table below shows the top six US metro regions in terms of air travel (source). While NYC tops the list, Los Angeles and San Francisco are not far behind. Being on the east coast, NYC also attracts more European tourists, and Europe is where its outbreak originated.

Metro                   Yearly Passengers  Airport(s)
New York City           134,353,971        JFK, Newark, LaGuardia, Stewart, Long Island MacArthur, Westchester
Atlanta                 104,171,935        Hartsfield–Jackson
Los Angeles             102,630,641        LAX, Long Beach, Bob Hope/Burbank, John Wayne, Ontario
Chicago                 101,202,068        O’Hare, Midway, Rockford
Miami                   80,054,002         Miami, Fort Lauderdale, Palm Beach
San Francisco Bay Area  75,966,974         San Francisco, Oakland, San Jose
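To put “not far behind” in numbers, the sketch below expresses each metro’s passenger volume as a share of NYC’s, using the figures from the table. The variable names are illustrative.

```python
# Yearly passengers copied from the air travel table.
passengers = {
    "New York City": 134_353_971,
    "Atlanta": 104_171_935,
    "Los Angeles": 102_630_641,
    "Chicago": 101_202_068,
    "Miami": 80_054_002,
    "San Francisco Bay Area": 75_966_974,
}

nyc = passengers["New York City"]
share_of_nyc = {metro: count / nyc for metro, count in passengers.items()}

for metro, share in share_of_nyc.items():
    print(f"{metro:<24} {share:.0%} of NYC")

# Combined volume of the two California metros relative to NYC.
california = passengers["Los Angeles"] + passengers["San Francisco Bay Area"]
print(f"LA + Bay Area: {california / nyc:.2f}x NYC")
```

By these figures, Los Angeles handles roughly 76% of NYC’s passenger volume and the Bay Area roughly 57%; combined, the two California metros carry about a third more passengers than NYC.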

Transit

Crowded buses and subways are excellent places for viruses to spread. In my post on transit ridership trends, I used the chart below to show just how much larger the transit market is in NYC than in the rest of the country.

Unlike the modest differences in air travel, NYC transit is 9.7 times larger than the largest market in California during normal operation. People crammed into enclosed spaces on the NYC subway were primed to spread the disease.

As the pandemic spread, San Francisco closed its subway, while Los Angeles cut back transit service after steep drops in ridership. NYC also saw reduced ridership and ran reduced service, but it only closed the subway between 1:00 am and 5:00 am.

The chart below updates my original 2019 chart with April 2020 data. NYC still recorded more than twice the ridership of Los Angeles.

Weather

Another key difference between California and New York is the weather. The chart below compares average temperatures in San Francisco and Los Angeles with New York City (source). California remained noticeably warmer than NYC through early April. There is still debate over the influence of weather on COVID-19, but some researchers think it could behave like its well-known coronavirus cousins, the seasonal coronaviruses behind many common colds, which circulate mostly in winter.

What about Lockdowns?

Lockdowns are the first factor many consider when thinking about differences between states. Intuition might suggest that states with weak or no lockdown restrictions would have higher rates of death and positive tests. Instead, the comparison of California and New York shows that other factors are more likely to drive severity.

The graphic below again shows hospitalization and death rates for all 50 states, but this time they are color-coded by lockdown duration (source). Northeastern states implemented longer lockdowns in response to a much more severe outbreak. Outside the northeast, states adopted different approaches, and no clear pattern has emerged showing that longer or shorter lockdowns made any meaningful difference.
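The comparison in the graphic is visual. If you wanted a single number for “no clear pattern,” one option (a swapped-in technique, not the method behind the graphic) is the Pearson correlation between lockdown duration and each state’s death rate, computed after setting aside the northeastern states. The helper below is a hypothetical sketch; Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient. A near-zero value would be consistent with the chart but would not by itself rule out an effect, since lockdown duration was partly a response to outbreak severity.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy numbers, not state data. In practice, xs would hold lockdown days and ys a
# death rate per million, one entry per state outside the northeast, in the same order.
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```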

Conclusion

Raw counts must be handled appropriately before making comparisons across time or geography. Further, the severity of the virus has varied across the country, driven by underlying differences like density. Rural states without lockdowns had roughly the same mild experience as their peers who implemented lockdowns. Finally, a second wave does appear to be forming, which should be expected as states re-open. This fact alone does not make clear what the right policy response should be. Trade-offs between viral spread and mass unemployment are difficult to balance.